Using a multi-staged strategy based on machine learning and mathematical modeling to predict genotype-phenotype risk patterns in diabetic kidney disease: a prospective case–control cohort analysis
Authors
Abstract
BACKGROUND Multi-causality and heterogeneity of phenotypes and genotypes characterize complex diseases. Using a database with a comprehensive collection of phenotypes and genotypes, we compared the performance of common machine learning methods for generating mathematical models that predict diabetic kidney disease (DKD).
METHODS In a prospective cohort of type 2 diabetic patients, we selected 119 subjects with DKD and 554 without DKD at enrolment and after a median follow-up period of 7.8 years for model training, testing and validation using seven machine learning methods (partial least squares regression, the classification and regression tree, the C5.0 decision tree, random forest, naïve Bayes classification, neural network and support vector machine). We used 17 clinical attributes and 70 single nucleotide polymorphisms (SNPs) of 54 candidate genes to build different models. The top attributes selected by the best-performing models were then used to build models with performance comparable to those using the entire dataset.
RESULTS Age, age of diagnosis, systolic blood pressure and genetic polymorphisms of uteroglobin and lipid metabolism were selected by most methods. Models generated by support vector machine (svmRadial) and random forest (cforest) had the best prediction accuracy, whereas models derived from the naïve Bayes classifier and partial least squares regression performed worst. Using 10 clinical attributes (systolic and diastolic blood pressure, age, age of diagnosis, triglyceride, white blood cell count, total cholesterol, waist-to-hip ratio, LDL cholesterol, and alcohol intake) and 5 genetic attributes (UGB G38A, LIPC -514C>T, APOB Thr71Ile, APOC3 3206T>G and APOC3 1100C>T), selected most often by svmRadial and cforest, we were able to build high-performance models.
CONCLUSIONS Amongst different machine learning methods, svmRadial and cforest had the best performance. Genetic polymorphisms related to inflammation and lipid metabolism warrant further investigation for their associations with DKD.
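The attribute-selection stage described in the abstract (keeping the attributes chosen most often across models) can be illustrated with a small vote count. This is only a sketch under stated assumptions, not the authors' procedure: the function name consensus_attributes, the min_votes threshold, and the per-model selections are all hypothetical placeholders and do not reproduce any result from the study.

```python
from collections import Counter


def consensus_attributes(selections, min_votes):
    """Return attributes chosen by at least `min_votes` models, most-voted first.

    `selections` maps a model name to the attributes that model selected.
    Ties in vote count are broken by sorting attribute names (uppercase first).
    """
    votes = Counter()
    for attrs in selections.values():
        votes.update(set(attrs))  # each model contributes one vote per attribute
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, count in ranked if count >= min_votes]


# Hypothetical placeholder selections for illustration only (NOT study results).
selections = {
    "svmRadial": ["age", "systolic_bp", "UGB_G38A"],
    "cforest": ["age", "systolic_bp", "triglyceride"],
    "C5.0": ["age", "UGB_G38A", "alcohol_intake"],
}

shortlist = consensus_attributes(selections, min_votes=2)
print(shortlist)
```

A reduced model would then be retrained on the shortlist alone and compared against the full-attribute model. The right threshold depends on how many models are compared and how stable their selections are.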
Similar resources
Hybrid Method of Logistic Regression and Data Envelopment Analysis for Event Prediction: A Case Study (Stroke Disease)
Abstract Predictive analytics is an area of statistics that deals with extracting information from data and using it to predict trends and behavior patterns. Many mathematical models have been developed and used for prediction, and in some cases, they have been found to be very strong and reliable. This paper studies different mathematical and statistical approaches for event prediction. The ...
Patterns of Changes in Abdominal Obesity Indices in Prediabetic Individuals: Results of a 16-year Prospective Cohort Study among First-degree Relatives of Type 2 Diabetic Patients
Introduction: Previous studies have not investigated the association of concomitant changes in obesity indicators with diabetes in prediabetic patients. This study aimed to identify the patterns of changes in the abdominal obesity indices over time in prediabetic patients and to predict high-risk individuals for the future risk of diabetes development. Materials and Methods: This prospective 16...
Machine learning algorithms in air quality modeling
Modern studies in the field of environmental science and engineering show that deterministic models struggle to capture the relationship between the concentration of atmospheric pollutants and their emission sources. The recent advances in statistical modeling based on machine learning approaches have emerged as a solution to tackle these issues. It is a fact that input variable type largely affec...
The Comparison of Major Dietary Patterns in People with and without Calcium Oxalate Kidney Stone: A Case-Control Study
Background: It has been suggested that dietary patterns might play a role in the pathogenesis of nephrolithiasis. The aim of this study was to determine the relationship between dietary patterns and the occurrence of calcium oxalate kidney stone disease. Methods: A case-control study was conducted on 634 male and female participants aged 18-65 in Tehran using a convenience sampling method. The partic...
Modeling and analysis of leishmaniasis distribution process using multilayer perceptron neural network and support vector regression (Case study: villages of Isfahan province)
Villages located in Isfahan province are among the areas prone to the spread of cutaneous leishmaniasis, which is characterized by the occurrence of wounds on the skin. To predict the future prevalence of cutaneous leishmaniasis, continuous monitoring of the spatial distribution of this disease is essential. Disease modeling was performed using two machine learning algorithms called support ve...
Journal title:
Volume 14, Issue
Pages -
Publication date: 2013